AITopics | exact natural gradient

Collaborating Authors

exact natural gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exact natural gradient in deep linear networks and its application to the nonlinear case

Neural Information Processing SystemsNov-20-2025, 22:31:16 GMT

Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss decreases exponentially to the global minimum in parameter space. Our expression for the natural gradient is surprisingly simple, computationally tractable, and explains why some approximations proposed previously work well in practice. This opens new avenues for approximating the natural gradient in the nonlinear case, and we show in preliminary experiments that our online natural gradient descent outperforms SGD on MNIST autoencoding while sharing its computational simplicity.

exact natural gradient, natural gradient, nonlinear case, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

Add feedback

Reviews: Exact natural gradient in deep linear networks and its application to the nonlinear case

Neural Information Processing SystemsOct-7-2024, 15:22:46 GMT

The main result is that the natural gradient completely removes pathological curvature introduced by depth, yielding exponential convergence in the total weights (as though it were a shallow network). The paper traces connections to a variety of previous methods to approximate the Fisher information matrix, and shows a preliminary application of the method to nonlinear networks (for which it is no longer exact), where it appears to speed up convergence. Major comments: This paper presents an elegant analysis of learning dynamics under the natural gradient. Even though the results are obtained for deep linear networks, they are decisive for this case and suggest strongly that future work in this direction could bring principled benefits for the nonlinear case (as shown at small scale in the nonlinear auto encoder experiment). The analysis provides solid intuitions for prior work on approximating second order methods, including an interesting observation on the structure of the Hessian: it is far from block diagonal, a common assumption in prior work. Yet off diagonal blocks are repeats of diagonal blocks, yielding similar results.

application, natural gradient, nonlinear case, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.37)

Add feedback

Exact natural gradient in deep linear networks and its application to the nonlinear case

Bernacchia, Alberto, Lengyel, Mate, Hennequin, Guillaume

Neural Information Processing SystemsFeb-14-2020, 17:41:26 GMT

exact natural gradient, natural gradient, nonlinear case, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback